328 PART 6 Analyzing Survival Data

Knowing When to Use Survival Regression

In Chapter 21, we examine the special problems that come up when the researcher

can’t continue to collect data during follow-up on a participant long enough to

observe whether or not they ever experience the event being studied. To recap, in

this situation, you should treat the data as censored. This means acknowledging that the

participant was observed for only a limited amount of time and was then lost to

follow-up. In that chapter, we also explain how to summarize survival data using

life tables and the Kaplan-Meier method, and how to graph time-to-event data as

survival curves. In Chapter 22, we describe the log-rank test, which you can use to

compare survival among a small number of groups — for example, participants

taking a drug versus a placebo, or participants initially diagnosed at four different

stages of the same cancer.

But the log-rank test has limitations:

» The log-rank test doesn’t handle numerical predictors well. Because this

test compares survival among a small number of categories, it does not work

well for a numerical variable like age. To compare survival among different

age groups with the log-rank test, you would first have to categorize the

participants into age ranges. The age ranges you choose for your groups

should be based on your research question. Because doing this loses the

granularity of the data, this test may be less efficient at detecting gradual

trends across the whole age range.

» The log-rank test doesn’t let you analyze the simultaneous effect of

different predictors. If you try to create subgroups of participants for each

distinct combination of categories for more than one predictor (such as three

treatment groups and three diagnostic groups), you will quickly see that you

have too many groups and not enough participants in each group to support

the test. In this example, with three treatment groups and three diagnostic

groups, you would have 3 × 3 = 9 groups, which is already too many for a

log-rank test to be useful. Even if you have 100 participants in your study,

dividing them into nine groups leaves only about 11 participants per group on

average, making the subgroup estimates unstable.
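
To make both limitations concrete, here is a minimal Python sketch. It is only an illustration: the age cut points (40 and 60), the group labels, and the helper names age_group and even_split are assumptions made up for this example, and the 100 participants are assumed to be spread as evenly as possible across the groups.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical cut points; in a real study, base them on your research question.
AGE_CUTS = [40, 60]
AGE_LABELS = ["under 40", "40 to 59", "60 and over"]


def age_group(age):
    """Return the label of the (assumed) age range that contains `age`."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


def even_split(n_participants, n_groups):
    """Spread participants across groups as evenly as possible."""
    base, extra = divmod(n_participants, n_groups)
    return [base + 1 if i < extra else base for i in range(n_groups)]


# Limitation 1: categorizing age discards detail within each range.
# Ages 40 and 59 share a group, while ages 39 and 40 are separated.
for age in (39, 40, 59, 60):
    print(age, "->", age_group(age))

# Limitation 2: combining predictors multiplies the number of groups.
treatments = ["treatment 1", "treatment 2", "treatment 3"]
diagnoses = ["diagnosis 1", "diagnosis 2", "diagnosis 3"]
cells = list(product(treatments, diagnoses))  # every treatment/diagnosis pair
sizes = even_split(100, len(cells))

print("number of groups:", len(cells))
print("largest group:", max(sizes), "smallest group:", min(sizes))
```

Even in this best case of a perfectly balanced study, no group has more than 12 participants. Real studies are rarely balanced, so some groups would be smaller still.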

Use survival regression when the outcome (the Y variable) is a time-to-event

variable, like survival time. Survival regression lets you do all of the following,

either in separate models or simultaneously:

» Determine whether there is a statistically significant association between

survival and one or more other predictor variables